modules: Move feasibility/satisfiability checking into a new module #1285
Draft: jacobtkeio wants to merge 15 commits into flux-framework:master from jacobtkeio:master
Codecov Report
Attention: Patch coverage is

@@            Coverage Diff            @@
##           master   #1285    +/-   ##
=======================================
  Coverage    75.3%   75.4%
=======================================
  Files         111     114       +3
  Lines       15361   15526     +165
=======================================
+ Hits        11575   11708     +133
- Misses       3786    3818      +32
jacobtkeio force-pushed the master branch 3 times, most recently from c31f3e7 to c6f6c95 (December 2, 2024 20:57)
Problem: sched.feasibility RPCs take up too much of sched-fluxion-resource's single-threaded time. Add a CLI option that creates a 'feasibility version' of sched-fluxion-resource, called sched-fluxion-satisfiability, which can run on multiple ranks.
Problem: Only one resource.acquire RPC can be active at a time, but both s-f-resource and s-f-satisfiability try to open one. Make s-f-satisfiability call s-f-resource.notify to populate its resources (similar to what s-f-qmanager does). Make s-f-resource send its resources instead of null in notify responses to accommodate s-f-satisfiability. Finally, force the FIRST matching policy for s-f-satisfiability.
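A minimal C++ sketch of the notify-based handoff, under stated assumptions: ResourceSet, ResourceModule, FeasibilityModule, and MatchPolicy are hypothetical stand-ins, not flux-sched types, and a direct method call replaces the real Flux RPC. It models only the property the commit relies on: s-f-resource alone holds resource.acquire, while the second module copies its resource set from a notify response and pins its matching policy to FIRST.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the resource set that s-f-resource acquires.
struct ResourceSet {
    std::vector<std::string> nodes;
};

// Hypothetical model of s-f-resource: the only holder of resource.acquire.
class ResourceModule {
public:
    explicit ResourceModule(ResourceSet acquired) : acquired_(std::move(acquired)) {}

    // After the change, a notify response carries the acquired resources
    // instead of null, so another module can build its graph from it.
    ResourceSet notify() const { return acquired_; }

private:
    ResourceSet acquired_;
};

// First is the policy the commit forces; Configured is a placeholder for
// whatever policy the module would otherwise have been given.
enum class MatchPolicy { First, Configured };

// Hypothetical model of s-f-satisfiability (later s-f-feasibility).
class FeasibilityModule {
public:
    void populate_from(const ResourceModule &rsrc) {
        graph_ = rsrc.notify();        // no second resource.acquire stream
        policy_ = MatchPolicy::First;  // forced regardless of configuration
    }
    bool has_resources() const { return graph_.has_value(); }
    std::size_t node_count() const { return graph_ ? graph_->nodes.size() : 0; }
    MatchPolicy policy() const { return policy_; }

private:
    std::optional<ResourceSet> graph_;
    MatchPolicy policy_ = MatchPolicy::Configured;
};
```

In the real modules the notify response arrives asynchronously on the reactor; the synchronous call here only illustrates which module owns the acquired resources.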
Problem: Having a 'satisfiability version' of the s-f-resource module makes resource_match.cpp hard to read and less maintainable. Split resource_match.cpp into resource.cpp and feasibility.cpp, which contain their respective modules. Leave the code common to both in resource_match.cpp, with a new header, resource_match.hpp. This simplifies adding new modules that acquire and match resources.
Problem: Tests segfault when unloading s-f-resource under flux-broker. Create a sched-fluxion-resource-module library that loads the 'resource' target only once for both s-f-resource and s-f-feasibility. The previous CMakeLists.txt loaded the 'resource' target twice, which caused a segfault on the static variable ANY_RESOURCE_TYPE{"*"} in matcher.cpp during unload when running under 'flux broker'.
Problem: The feasibility module ignores s-f-resource after it gets resources from s-f-resource.notify, but it should exit when s-f-resource does, to prevent odd behavior after an s-f-resource reload, especially a reload with different resources. Make s-f-feasibility listen for errors on the .notify stream.

Send graph expiration time to feasibility module

Problem: s-f-resource.notify does not currently send graph expiration info to s-f-feasibility. Send it.

Fix several errors

Problem: Improper use of git, including forgetting to add a file in the previous commit and losing changes during a merge. Fix various things, including the broken type of m_acquired_resources and unnecessary, broken code in resource_match_opts.
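The exit-on-error rule and the expiration update from these commits can be modeled as a small state machine. This is a hypothetical sketch: NotifyResponse and FeasibilityModule::on_notify are illustrative names, and a stream error is represented by a nonzero errnum rather than by Flux's actual RPC error handling.

```cpp
#include <cassert>
#include <cstdint>

// One (hypothetical) response on the notify stream, as seen by the
// feasibility module: a nonzero errnum means the stream failed.
struct NotifyResponse {
    int errnum = 0;
    std::int64_t expiration = 0;  // graph expiration time sent by s-f-resource
};

class FeasibilityModule {
public:
    // Returns false once the module has stopped serving requests.
    bool on_notify(const NotifyResponse &rsp) {
        if (!running_)
            return false;
        if (rsp.errnum != 0) {
            // s-f-resource went away (e.g. it is being reloaded, possibly
            // with different resources): stop instead of using stale data.
            running_ = false;
            return false;
        }
        expiration_ = rsp.expiration;  // track the latest expiration
        return true;
    }
    bool running() const { return running_; }
    std::int64_t expiration() const { return expiration_; }

private:
    bool running_ = true;
    std::int64_t expiration_ = 0;
};
```

Once stopped, later responses are ignored, so the module cannot silently pick up a different resource set without being reloaded.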
Problem: s-f-feasibility launches on all ranks by default, which is probably not a good default. Change rc1.d/01-s-f to launch s-f-feasibility on rank 0 only, and change rc3.d/01-s-f to remove s-f-feasibility from all ranks. This allows any layout of s-f-feasibility instances while guaranteeing at least one.
Problem: notify_request_cb waits for resource.acquire if it has not yet received resources. However, it is guaranteed to have resources: init_resource_graph must return before flux_reactor_run starts, and notify_request_cb can only run after that. Remove the check.
Problem: s-f-feasibility performs feasibility checking through the sched.feasibility RPC. However, RFC 27 requires feasibility checking to be performed through feasibility.check. Make s-f-feasibility register the 'feasibility' service and the 'feasibility.check' RPC instead of the 'sched.feasibility' RPC. Remove the 'sched.feasibility' forwarder callback from qmanager.cpp.
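The topic rename matters because of routing: Flux delivers a request to the service named by the topic up to its first '.', so 'feasibility.check' reaches the module registered as 'feasibility', whereas 'sched.feasibility' was handled by qmanager, which had to forward it. The sketch below models only that lookup; Router, service_of, and the "ENOSYS" placeholder reply are hypothetical and are not Flux APIs.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Service name = topic prefix before the first '.'; a topic with no '.'
// is treated here as naming the service itself.
std::string service_of(const std::string &topic) {
    return topic.substr(0, topic.find('.'));
}

// Hypothetical router mapping registered service names to handlers.
class Router {
public:
    using Handler = std::function<std::string(const std::string &topic)>;

    void register_service(const std::string &name, Handler h) {
        services_[name] = std::move(h);
    }

    // Returns the handler's reply, or the placeholder "ENOSYS" when no
    // registered service matches the topic's prefix.
    std::string dispatch(const std::string &topic) const {
        auto it = services_.find(service_of(topic));
        return it == services_.end() ? std::string("ENOSYS") : it->second(topic);
    }

private:
    std::map<std::string, Handler> services_;
};
```

With only the 'feasibility' service registered, nothing in this model answers 'sched.feasibility' any more, which mirrors removing the qmanager forwarder.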
Problem: Some tests call 'sched.feasibility', which no longer exists. Replace 'sched.feasibility' with 'feasibility.check'.

Add feasibility module to sched-sharness and t1020

Problem: Some tests that need satisfiability information do not load and unload s-f-feasibility. Add load_feasibility, reload_feasibility, and remove_feasibility functions to sched-sharness.sh and use them in the relevant tests.
Problem: All calls to feasibility.check are in tests for s-f-resource, not in tests for s-f-feasibility. Move them to a separate test for the feasibility module.

Disallow load-file behavior in feasibility test

Problem: t4014 expects feasibility to load resources from a resource module that was passed a load-file, which is not desired behavior. Update t4014 to expect failure on such a load.
Problem: The formatting did not pass the CI code formatting check. Apply the required changes.
Problem: As a holdover from s-f-resource, s-f-feasibility marks its acquired resources as DOWN, which is unnecessary. Remove this resource marking from init_resource_graph.
Problem: If flux_respond_pack fails in notify_request_cb, no response is sent, leaving feasibility active but without resources. Make notify_request_cb send a flux_respond_error response when flux_respond_pack fails.
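The fix amounts to a fallback: if the success response cannot be sent, send an error response so the requester is not left waiting. In this sketch, FakeHandle and its respond_pack and respond_error methods are hypothetical stand-ins for flux_respond_pack() and flux_respond_error(); their signatures, the EPROTO value, and the message text are assumptions for illustration only.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

// Hypothetical stand-in for a Flux handle: records what the requester
// would receive instead of sending real messages.
struct FakeHandle {
    bool fail_pack = false;         // simulate the success response failing
    std::vector<std::string> sent;  // responses the requester would see

    int respond_pack(const std::string &payload) {
        if (fail_pack) {
            errno = EPROTO;
            return -1;
        }
        sent.push_back("ok:" + payload);
        return 0;
    }
    int respond_error(int errnum, const std::string &msg) {
        sent.push_back("err:" + std::to_string(errnum) + ":" + msg);
        return 0;
    }
};

// Shape of the fixed callback: a failed success response is followed by
// an error response rather than by silence.
void notify_request_cb(FakeHandle &h, const std::string &resources) {
    if (h.respond_pack(resources) < 0)
        h.respond_error(errno, "error responding to notify request");
}
```

Because s-f-feasibility exits on an error from the .notify stream, an explicit error turns a silent, resource-less module into one that shuts down.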
Problem: fedora40 CI fails on t4014 due to:
    error: bug in the test script: broken &&-chain: load_feasibility flux dmesg -c | grep -q "File exists"
Add && between successive statements in t4014.
jacobtkeio force-pushed the master branch 4 times, most recently from 93b13a6 to 3d94893 (December 20, 2024 21:27)
jacobtkeio force-pushed the master branch 4 times, most recently from c93a773 to bcfbb92 (January 1, 2025 01:07)
jacobtkeio force-pushed the master branch 5 times, most recently from 1b1a544 to 510b18f (January 7, 2025 09:11)
jacobtkeio force-pushed the master branch 2 times, most recently from 73523f2 to 6308980 (January 8, 2025 06:57)
Also reload s-f-feasibility during resource load
Problem: Feasibility checking takes up a significant amount of sched-fluxion-resource's time that could be better spent on scheduling. Additionally, feasibility checking is not RFC 27 compliant.
Solution: Move feasibility checking into a new module, sched-fluxion-feasibility, that can run on multiple ranks and make it RFC 27 compliant.